We claim: 
1. A computer-implemented method of text equivalencing from a string of characters comprising:
modifying the string of characters using a predetermined set of heuristics;
comparing the modified string with a known string of characters in order to locate a match;
responsive to not finding a match, forming a plurality of sub-strings of characters from the string of characters; and
using an information retrieval technique on the sub-strings of characters to determine a known string of characters equivalent to the string of characters.
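
The method of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the heuristic set (lowercasing and removing whitespace), the helper names `normalize`, `trigrams` and `equivalence`, and the use of a shared-3-gram count as the information retrieval technique are all hypothetical choices.

```python
from typing import Optional


def normalize(s: str) -> str:
    # Hypothetical heuristic set: case-fold and drop all whitespace.
    return "".join(s.lower().split())


def trigrams(s: str) -> set[str]:
    # Overlapping, case-folded 3-character sub-strings of the input string.
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def equivalence(query: str, known: list[str]) -> Optional[str]:
    # Modify the query with the heuristics and look for an exact match.
    index = {normalize(k): k for k in known}
    hit = index.get(normalize(query))
    if hit is not None:
        return hit
    # No match: form sub-strings and rank the known strings.
    # Stand-in retrieval technique: count of shared 3-grams.
    q = trigrams(query)
    best, best_score = None, 0
    for k in known:
        score = len(q & trigrams(k))
        if score > best_score:
            best, best_score = k, score
    return best  # None when no known string shares any 3-gram
```

With `known = ["Hey Jude", "Let It Be", "Yesterday"]`, the input "hey  jude" is resolved by the exact-match step, while the misspelling "Let It Bee" falls through to the 3-gram ranking and returns "Let It Be".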

2. The method of claim 1, wherein the information retrieval technique further comprises:
weighting the sub-strings;
scoring the known string of characters; and
retrieving information associated with the known string of characters with the highest score.
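
One possible reading of the weighting and scoring steps of claim 2, assuming inverse-document-frequency (IDF) weights over 3-grams. The claim does not name a weighting scheme, so IDF, the treatment of unseen 3-grams, the normalization by total query weight, and the function names `idf_weights` and `best_match` are assumptions made for illustration.

```python
import math
from collections import Counter


def trigrams(s: str) -> set[str]:
    # Overlapping, case-folded 3-character sub-strings.
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def idf_weights(known: list[str]) -> dict[str, float]:
    # Weight each 3-gram by inverse document frequency over the known strings,
    # so rare sub-strings count for more than common ones (+1 keeps weights > 0).
    df = Counter(g for k in known for g in trigrams(k))
    n = len(known)
    return {g: math.log(n / c) + 1.0 for g, c in df.items()}


def best_match(query: str, known: list[str]) -> tuple[str, float]:
    # Assumes `known` is non-empty.
    weights = idf_weights(known)
    # 3-grams never seen in the known strings are weighted as the rarest possible.
    unseen = math.log(len(known)) + 1.0
    q = trigrams(query)
    total = sum(weights.get(g, unseen) for g in q)

    def score(k: str) -> float:
        # Share of the query's total weight carried by 3-grams it has in common with k.
        if total == 0:
            return 0.0
        return sum(weights[g] for g in q & trigrams(k)) / total

    # Retrieve the known string with the highest score.
    best = max(known, key=score)
    return best, score(best)
```

Normalizing by the query's total weight keeps every score between 0 and 1, which makes fixed thresholds such as those in claims 3 to 5 easier to choose.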



3. The method of claim 2, further comprising, responsive to the highest score being greater than a first threshold, automatically accepting the known string of characters as an exact match.
4. The method of claim 2, further comprising, responsive to the highest score being less than a first threshold and greater than a second threshold, presenting the known string of characters to a user for manual confirmation.

5. The method of claim 2, further comprising, responsive to the highest score being less than a second threshold and greater than a third threshold, presenting the known string of characters to a user to select the equivalent string of characters.
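
Claims 3 to 5 divide the highest score into tiers. The sketch below assumes the ordering first > second > third implied by claims 15 and 16, picks hypothetical threshold values on a 0 to 1 scale, and treats scores at or below the third threshold as "no equivalent", a case the claims leave unspecified. How exact boundary values are handled is likewise an assumption.

```python
from enum import Enum


class Action(Enum):
    AUTO_ACCEPT = "accept automatically as an exact match"
    CONFIRM = "ask the user to confirm the single candidate"
    SELECT = "ask the user to pick from candidate options"
    REJECT = "no equivalent found"


# Hypothetical values; the claims only require FIRST > SECOND > THIRD.
FIRST, SECOND, THIRD = 0.9, 0.7, 0.5


def classify(score: float) -> Action:
    # Map the highest retrieval score to one of the tiered responses.
    if score > FIRST:
        return Action.AUTO_ACCEPT
    if score > SECOND:
        return Action.CONFIRM
    if score > THIRD:
        return Action.SELECT
    return Action.REJECT
```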

6. The method of claim 1, wherein the sub-strings of characters are 3-grams. 
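
To show why 3-grams tolerate small spelling differences, the sketch below extracts overlapping 3-grams and compares two strings with the Jaccard coefficient. Jaccard is a stand-in similarity measure chosen for illustration, not one named in the claims, and the function names are hypothetical.

```python
def ngrams(s: str, n: int = 3) -> list[str]:
    # Overlapping sub-strings of length n; strings shorter than n yield none.
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def jaccard(a: str, b: str, n: int = 3) -> float:
    # Fraction of distinct n-grams the two strings share.
    x, y = set(ngrams(a, n)), set(ngrams(b, n))
    return len(x & y) / len(x | y) if x | y else 0.0


print(ngrams("Abbey Road"))
# ['Abb', 'bbe', 'bey', 'ey ', 'y R', ' Ro', 'Roa', 'oad']
print(jaccard("Abbey Road", "Abby Road"))
# 0.5
```

A one-letter omission changes only the few 3-grams that span it, so half of the distinct 3-grams still match even though an exact comparison fails.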

7. The method of claim 1, wherein the string of characters is selected from the group consisting of a song title, a song artist, an album name, a book title, an author's name, a book publisher, a genetic sequence, and a computer program.

8. The method of claim 1, wherein the predetermined set of heuristics comprises removing whitespace from the string of characters.

9. The method of claim 1, wherein the predetermined set of heuristics comprises removing a portion of the string of characters.

10. The method of claim 1, wherein the predetermined set of heuristics comprises replacing a symbol in the string of characters with an alternate representation for the symbol.
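
Claims 8 to 10 name three kinds of heuristic without listing concrete rules. The rules below (dropping parenthetical text and a leading "the", spelling out `&`, `+` and `@`, then removing whitespace) are hypothetical examples of each kind, and `apply_heuristics` and `SYMBOLS` are illustrative names; the rule order is also an assumption.

```python
import re

# Hypothetical symbol table for claim 10 style replacement.
SYMBOLS = {"&": "and", "+": "and", "@": "at"}


def apply_heuristics(s: str) -> str:
    s = s.lower()
    # Claim 9 style: remove a portion, here any parenthetical such as "(live)".
    s = re.sub(r"\([^)]*\)", "", s)
    # Claim 9 style: drop a leading article "the ".
    s = re.sub(r"^\s*the\s+", "", s)
    # Claim 10 style: replace each symbol with a spelled-out alternate form.
    for sym, word in SYMBOLS.items():
        s = s.replace(sym, f" {word} ")
    # Claim 8 style: remove all whitespace last, after rules that rely on word boundaries.
    return re.sub(r"\s+", "", s)
```

For example, "Simon & Garfunkel" and "Simon and Garfunkel" both reduce to "simonandgarfunkel", and "The Beatles (Remastered)" reduces to "beatles".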
11. The method of claim 1, further comprising storing an indication that the string of characters is the equivalent of the known string of characters.
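
Claim 11 stores an indication of equivalence. One plausible benefit, assumed here rather than stated in the claims, is that a later query for the same input can then be resolved by the exact-match step. `EquivalenceStore` is a hypothetical class sketching that idea with an in-memory dictionary keyed by a stand-in heuristic (lowercasing and removing whitespace).

```python
from typing import Optional


class EquivalenceStore:
    """Hypothetical store mapping normalized input strings to known strings."""

    def __init__(self, known: list[str]) -> None:
        self._map = {self._key(k): k for k in known}

    @staticmethod
    def _key(s: str) -> str:
        # Stand-in heuristic: case-fold and drop all whitespace.
        return "".join(s.lower().split())

    def lookup(self, s: str) -> Optional[str]:
        # Exact-match step against known strings and recorded equivalences.
        return self._map.get(self._key(s))

    def record(self, s: str, known: str) -> None:
        # Store the indication that s is equivalent to the known string.
        self._map[self._key(s)] = known
```

After `record("Let It Bee", "Let It Be")`, a later `lookup("let it bee")` returns "Let It Be" directly, without forming sub-strings again.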

12. A computer-implemented system for text equivalencing from a string of characters comprising:
a heuristics module for modifying the string of characters using a predetermined set of heuristics;
a comparator module, coupled to the heuristics module, for comparing the modified string with a known string of characters in order to find a match;
a sub-string formation module, coupled to the comparator module, responsive to not finding a match, for forming a plurality of sub-strings of characters from the string of characters; and
an information retrieval module, coupled to the sub-string formation module, for performing an information retrieval technique on the sub-strings of characters to determine a known string of characters equivalent to the string of characters.

13. The system of claim 12, wherein the information retrieval module further comprises:
a weight module for weighting the sub-strings;
a score module for scoring the known string of characters; and
a retrieval module, coupled to the weight and score modules, for retrieving information associated with the known string of characters with the highest score.

14. The system of claim 13, further comprising an accept module, coupled to the retrieval module, for accepting the information retrieved as an exact match for the highest score greater than a first threshold.
15. The system of claim 13, further comprising an accept module, coupled to the retrieval module, for presenting the information retrieved to a user for manual confirmation for the highest score less than a first threshold and greater than a second threshold.



16. The system of claim 13, further comprising an accept module, coupled to the retrieval module, for presenting the information retrieved to a user as a set of options for the user to select for the highest score less than a second threshold and greater than a third threshold.

17. The system of claim 12, wherein the sub-strings of characters are 3-grams.

18. The system of claim 12, wherein the string of characters is selected from the group consisting of a song title, a song artist, an album name, a book title, an author's name, a book publisher, a genetic sequence, and a computer program.

19. The system of claim 12, wherein the predetermined set of heuristics comprises removing whitespace from the string of characters.
20. The system of claim 12, wherein the heuristics module comprises a removal module
for removing a portion of the string of characters. 

21. The system of claim 12, wherein the heuristics module comprises a replacement module for replacing a symbol in the string of characters with an alternate representation for the symbol.

22. The system of claim 12, further comprising a database update module for storing an indication that the string of characters is the equivalent of the known string of characters.

23. A computer-readable medium comprising computer-readable code for performing text equivalencing from a string of characters comprising:
computer-readable code adapted to modify the string of characters using a predetermined set of heuristics;
computer-readable code adapted to compare the modified string with a known string of characters in order to locate a match;
computer-readable code, responsive to not finding a match, adapted to form a plurality of sub-strings of characters from the string of characters; and
computer-readable code adapted to use an information retrieval technique on the sub-strings of characters to determine a known string of characters equivalent to the string of characters.
24. The computer-readable medium of claim 23, wherein the information retrieval technique further comprises:
computer-readable code adapted to weight the sub-strings;
computer-readable code adapted to score the known string of characters; and
computer-readable code adapted to retrieve information associated with the known string of characters with the highest score.

25. The computer-readable medium of claim 24, further comprising computer-readable code, responsive to the highest score being greater than a first threshold, adapted to automatically accept the known string of characters as an exact match.

26. The computer-readable medium of claim 24, further comprising computer-readable code, responsive to the highest score being less than a first threshold and greater than a second threshold, adapted to present the known string of characters to a user for manual confirmation.

27. The computer-readable medium of claim 24, further comprising computer-readable code, responsive to the highest score being less than a second threshold and greater than a third threshold, adapted to present the known string of characters to a user to select the equivalent string of characters.

28. The computer-readable medium of claim 23, wherein the sub-strings of characters are 3-grams.
29. The computer-readable medium of claim 23, wherein the string of characters is selected from the group consisting of a song title, a song artist, an album name, a book title, an author's name, a book publisher, a genetic sequence, and a computer program.

30. The computer-readable medium of claim 23, wherein the predetermined set of heuristics comprises removing whitespace from the string of characters.

31. The computer-readable medium of claim 23, wherein the predetermined set of 
heuristics comprises removing a portion of the string of characters.

32. The computer-readable medium of claim 23, wherein the predetermined set of heuristics comprises replacing a symbol in the string of characters with an alternate representation for the symbol.

33. The computer-readable medium of claim 23, further comprising updating the known string of characters to indicate that the string of characters is the equivalent of the known string of characters.

34. A computer-implemented system for performing text equivalencing from a string of characters comprising:
a modifying means for modifying the string of characters using a predetermined set of heuristics;
a comparator means for comparing the modified string with a known string of characters in order to locate a match;
responsive to not finding a match, a formation means for forming a plurality of sub-strings of characters from the string of characters; and
an information retrieval means for determining a known string of characters equivalent to the string of characters.

35. The system of claim 34, wherein the information retrieval means further comprises:
a weight means for weighting the sub-strings;
a score means for scoring the known string of characters; and
a retrieval means for retrieving information associated with the known string of characters with the highest score.